This is a cooperative work between the Technical University of Darmstadt and the Forschungszentrum
Jülich.
You can see that Markus, Alexander, Felix and I are from the Technical University
of Darmstadt, while Nour and Bernd are from the Forschungszentrum Jülich.
To be precise, we actually presented part of this work at the ProTools workshop during
SC22 in Dallas last year.
And what you will see here is basically a slightly extended version with a few more aspects
about the future work.
So let's start with motivation.
So as you know, the performance and complexity of HPC systems keep increasing,
and the applications are also becoming more complex.
So it's very important to identify performance bottlenecks at an early stage.
And this is basically our motivation.
So what you usually do is that you use performance modeling, which has quite a long research
history.
And it's actually very good at predicting the scaling behavior of an application and thus
allows you to identify performance bottlenecks at an early stage.
But the problem is that the performance models depend on
the measurements, and in noisy environments you have a lot of noise in the
measurements.
So you have strong variations in the measurements, and the measurements also become irreproducible
and misleading.
So you can imagine that if you use these measurements to generate performance models, the performance
models would deviate strongly from what the real behavior of the application is.
So this is something we don't want.
And what we actually want is a performance model that actually describes the scaling
behavior of the application.
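To make the effect concrete, here is a minimal sketch of how a single noisy measurement can distort an extrapolated model. The linear model form, the process counts and all runtimes are hypothetical values chosen for illustration; they are not the model form or the data used in this work.

```python
def fit_linear(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(model, x):
    """Evaluate the fitted model at x."""
    a, b = model
    return a + b * x


# Hypothetical process counts and runtimes in seconds.
processes = [2, 4, 8, 16, 32]
# Assumed "true" behavior: runtime = 5 + 0.5 * p.
clean_runtimes = [6.0, 7.0, 9.0, 13.0, 21.0]
# Same measurements, but the largest run was disturbed by noise (+6 s).
noisy_runtimes = [6.0, 7.0, 9.0, 13.0, 27.0]

clean_model = fit_linear(processes, clean_runtimes)
noisy_model = fit_linear(processes, noisy_runtimes)

# Extrapolate both models to a scale that was never measured.
clean_prediction = predict(clean_model, 128)
noisy_prediction = predict(noisy_model, 128)
relative_error = abs(noisy_prediction - clean_prediction) / clean_prediction
```

In this toy setup, one disturbed measurement is enough to shift the extrapolated runtime at 128 processes by roughly a third, which is the kind of deviation from the real behavior described above.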
So let's look a bit more into this topic.
So you can see a graph.
Don't worry about it right now.
I will explain it in detail at the end of the presentation.
But what you can see is on the x-axis, you have the relative deviation from the mean.
And here on the y-axis, you have, for example, the time.
And the way to think about it is that we repeated the runs several times and then looked at
the deviation from the mean.
And the further away these values are, the more deviation there actually is.
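The relative deviation from the mean can be sketched as follows. This is only an illustration of the quantity on the x-axis: the function name and the runtimes are hypothetical, and the exact procedure used to produce the plot is not specified here.

```python
from statistics import mean


def relative_deviations(values):
    """Return (v - mean) / mean for each repeated measurement."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; relative deviation is undefined")
    return [(v - m) / m for v in values]


# Hypothetical runtimes in seconds from five repetitions of one configuration.
quiet_runs = [10.0, 10.1, 9.9, 10.0, 10.0]
noisy_runs = [10.0, 12.5, 9.5, 14.0, 9.0]

# Largest absolute relative deviation as a simple measure of spread.
quiet_spread = max(abs(d) for d in relative_deviations(quiet_runs))
noisy_spread = max(abs(d) for d in relative_deviations(noisy_runs))
```

With these made-up numbers, the quiet runs stay within about 1 % of their mean, while the noisy runs spread to about 27 %, so the noisy distribution would appear much wider along the x-axis.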
And you can see here two kinds of plots.
You can see a blue one and an orange one here.
And what is interesting is that the blue one actually is very short, as you can see here,
while the orange one has quite some deviation.
And this means that the metric time, which is here
on the y-axis, is strongly affected by noise.
So you can imagine that if you use this metric to generate performance models,
they will show a lot of deviation.
So what we thought was: why don't we use hardware counters?
Because hardware counters are only slightly affected by noise.
And this is something you can also see here on the y-axis.
So here you have the double-precision operations.
And what is very interesting to see is that, whether noise is present or not, it's basically
nearly the same.
Presenters
Accessible via
Open access
Duration
00:37:18 min
Recording date
2023-04-18
Uploaded on
2023-04-21 16:46:05
Language
en-US